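Due to the over-smoothing issue, most existing graph neural networks can only capture limited dependencies with their inherently finite aggregation layers. To overcome this limitation, we propose a new kind of graph convolution, called Graph Implicit Nonlinear Diffusion (GIND), which implicitly has access to infinite hops of neighbors while adaptively aggregating features with nonlinear diffusion to prevent over-smoothing. Notably, we show that the learned representation can be formalized as the minimizer of an explicit convex optimization objective. With this property, we can theoretically characterize the equilibrium of GIND from an optimization perspective. More interestingly, we can induce new structural variants by modifying the corresponding optimization objective. Specifically, we can embed prior properties into the equilibrium and introduce skip connections to promote training stability. Extensive experiments show that GIND is good at capturing long-range dependencies and, thanks to nonlinear diffusion, performs well on both homophilic and heterophilic graphs. Moreover, we show that the optimization-induced variants of our model can boost performance and improve training stability and efficiency. As a result, GIND obtains significant improvements on both node-level and graph-level tasks.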
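The idea of reaching "infinite hops" through an equilibrium can be illustrated with a far simpler layer than the one the abstract describes. The sketch below is an assumption-laden toy, not the GIND update rule: it uses a hypothetical linear update z ← γ·Â·z + x with a row-normalized adjacency matrix Â and γ < 1, and finds its fixed point by plain iteration. The function names and the path graph are illustrative only.

```python
def row_normalize(adj):
    """Divide each row of a dense adjacency matrix by its row sum."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def equilibrium(adj_norm, x, gamma=0.8, tol=1e-10, max_iter=10_000):
    """Iterate z <- gamma * A z + x until the largest update is below tol."""
    z = list(x)
    for _ in range(max_iter):
        z_new = [
            gamma * sum(a * zj for a, zj in zip(row, z)) + xi
            for row, xi in zip(adj_norm, x)
        ]
        if max(abs(a - b) for a, b in zip(z_new, z)) < tol:
            return z_new
        z = z_new
    return z


# Path graph 0 - 1 - 2 - 3; only node 0 carries an input signal.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
a_hat = row_normalize(adj)
x = [1.0, 0.0, 0.0, 0.0]
z = equilibrium(a_hat, x)
```

At the fixed point, node 3 has a nonzero value even though it sits three hops from the only input, which is the sense in which a single implicit layer sees arbitrarily distant neighbors. Because Â is row-stochastic and γ < 1, the toy update is a contraction, so the iteration converges.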
Unsupervised pre-training on millions of digital-born or scanned documents has shown promising advances in visual document understanding (VDU). While various vision-language pre-training objectives are studied in existing solutions, the document textline, as an intrinsic granularity in VDU, has seldom been explored so far. A document textline usually contains words that are spatially and semantically correlated, and it can be easily obtained from OCR engines. In this paper, we propose Wukong-Reader, trained with new pre-training objectives to leverage the structural knowledge nested in document textlines. We introduce textline-region contrastive learning to achieve fine-grained alignment between the visual regions and texts of document textlines. Furthermore, masked region modeling and textline-grid matching are also designed to enhance the visual and layout representations of textlines. Experiments show that our Wukong-Reader has superior performance on various VDU tasks such as information extraction. The fine-grained alignment over textlines also empowers Wukong-Reader with promising localization ability.
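The abstract does not state which loss implements textline-region contrastive learning. A common choice for aligning two sets of paired embeddings is a symmetric InfoNCE loss, sketched here under that assumption; the function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(text_emb, region_emb, temperature=0.07):
    """Average of text-to-region and region-to-text contrastive losses."""
    t = [normalize(v) for v in text_emb]
    r = [normalize(v) for v in region_emb]
    logits = [[dot(ti, rj) / temperature for rj in r] for ti in t]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy embeddings: row i of each list is assumed to describe the same textline.
text = [[1.0, 0.0], [0.0, 1.0]]
regions_aligned = [[0.9, 0.1], [0.1, 0.9]]
regions_shuffled = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = symmetric_info_nce(text, regions_aligned)
loss_shuffled = symmetric_info_nce(text, regions_shuffled)
```

The loss is small when each textline's text embedding is closest to its own region embedding and large when the pairing is scrambled, which is the behavior a fine-grained alignment objective needs.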
Denoising diffusion probabilistic models (DDPMs) have been proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. However, to our knowledge, few-shot image generation tasks have yet to be studied with DDPM-based approaches. Modern approaches are mainly built on Generative Adversarial Networks (GANs) and adapt models pre-trained on large source domains to target domains using a few available samples. In this paper, we make the first attempt to study when DDPMs overfit and suffer severe diversity degradation as training data become scarce. We then propose to adapt DDPMs pre-trained on large source domains to target domains using limited data. Our results show that utilizing knowledge from pre-trained DDPMs can significantly accelerate convergence and improve the quality and diversity of the generated images. Moreover, we propose a DDPM-based pairwise similarity loss to preserve the relative distances between generated samples during domain adaptation. In this way, we further improve the generation diversity of the proposed DDPM-based approaches. We demonstrate the effectiveness of our approaches qualitatively and quantitatively on a series of few-shot image generation tasks and achieve results better than current state-of-the-art GAN-based approaches in quality and diversity.
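We present PanGu-Coder, a pretrained decoder-only language model that adopts the PanGu-Alpha architecture for text-to-code generation, i.e. the synthesis of programming language solutions given a natural language problem description. We train PanGu-Coder using a two-stage strategy: the first stage employs Causal Language Modelling (CLM) to pre-train on raw programming language data, while the second stage uses a combination of Causal Language Modelling and Masked Language Modelling (MLM) training objectives that focus on the downstream task of text-to-code generation and trains on loosely curated pairs of natural language program definitions and code functions. Finally, we discuss PanGu-Coder-FT, which is fine-tuned on a combination of competitive programming problems and code with continuous integration tests. We evaluate PanGu-Coder with a focus on whether it generates functionally correct programs and demonstrate that it achieves equivalent or better performance than similarly sized models, such as Codex, while attending to a smaller context window and training on less data.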
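The two objectives named above differ in how training examples are built from a token sequence. The sketch below shows that difference only; it does not reproduce how PanGu-Coder combines the two objectives, which the abstract does not specify. The `<mask>` token, the masking probability, and the function names are assumptions, and the masking is a simplification (every selected token is replaced by the mask token).

```python
import random

MASK = "<mask>"  # hypothetical mask token


def clm_example(tokens):
    """Causal LM: predict each next token from the tokens before it."""
    return tokens[:-1], tokens[1:]


def mlm_example(tokens, mask_prob=0.15, seed=0):
    """Masked LM: hide a random subset of tokens and predict the originals.

    Labels are None at positions that do not contribute to the loss.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


tokens = "def add ( a , b ) : return a + b".split()
clm_in, clm_out = clm_example(tokens)
mlm_in, mlm_out = mlm_example(tokens, mask_prob=0.3)
```

In the causal case every position has a target and the model only sees the left context. In the masked case only the hidden positions have targets, but the model can use context on both sides.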
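Radiation encephalopathy (REP) is the most common complication of radiotherapy for nasopharyngeal carcinoma (NPC). It is highly desirable to assist clinicians in optimizing the NPC radiotherapy regimen according to the probability of REP onset, so as to reduce radiotherapy-induced temporal lobe injury (RTLI). To the best of our knowledge, this is the first exploration of predicting radiotherapy-induced REP by jointly exploiting image and non-image data in the NPC radiotherapy regimen. We cast REP prediction as a survival analysis task and evaluate predictive accuracy in terms of the concordance index (CI). We design a deep multimodal survival network (MSN) with two feature extractors to learn discriminative features from multimodal data. One feature extractor imposes feature selection on non-image data, and the other learns visual features from images. Because the BCI loss function, which directly maximizes the CI, is sensitive to uneven sampling per batch, we propose a novel weighted CI (WCI) loss function that leverages all REP samples effectively by assigning them different weights with a dual average operation. We further introduce a temperature hyperparameter for WCI to sharpen the risk difference between sample pairs and help the model converge. We extensively evaluate WCI on a private dataset to demonstrate its advantage over its counterparts. The experimental results also show that multimodal data from NPC radiotherapy bring further gains for REP risk prediction.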
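For readers unfamiliar with the evaluation metric, the concordance index can be computed as below. This is a minimal sketch of the standard Harrell's C-index, not the BCI or WCI loss from the abstract; the toy follow-up times, event flags, and risk scores are made up for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when subject i had the event and did so
    before subject j's recorded time. Tied risk scores count as half.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [2, 4, 6, 8]            # follow-up time per subject
events = [1, 1, 0, 1]           # 1 = event observed, 0 = censored
risks = [0.9, 0.7, 0.4, 0.1]    # higher score = predicted earlier onset

ci_good = concordance_index(times, events, risks)
ci_bad = concordance_index(times, events, risks[::-1])
ci_flat = concordance_index(times, events, [0.5] * 4)
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative scores, and 0.0 means the ranking is fully inverted. Since the index depends on pairwise comparisons, which pairs appear together in a batch affects a loss built on it, which is the sampling sensitivity the abstract refers to.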
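In the management of lung nodules, we wish to predict nodule evolution in terms of diameter variation on computed tomography (CT) scans and then provide a follow-up recommendation according to the predicted growth trend of the nodule. To improve the performance of growth trend prediction for lung nodules, it is vital to compare the changes of the same nodule across consecutive CT scans. Motivated by this, we screened out 4,666 subjects with more than two consecutive CT scans from the National Lung Screening Trial (NLST) dataset to organize a temporal dataset called NLSTT. Specifically, we first detect and pair regions of interest (ROIs) covering the same nodule based on registered CT scans. After that, we predict the texture category and diameter size of the nodules through models. Finally, we annotate the evolution class of each nodule according to its change in diameter. Based on the constructed NLSTT dataset, we propose a siamese encoder to simultaneously exploit the discriminative features of 3D ROIs detected from consecutive CT scans. We then design a novel spatial-temporal mixer (STM) to leverage the interval changes of the same nodule in sequential 3D ROIs and to capture spatial dependencies between nodule regions and the current 3D ROI. Following the clinical diagnosis routine, we employ a hierarchical loss to pay more attention to growing nodules. Extensive experiments on our organized dataset demonstrate the advantage of the proposed method. We also conduct experiments on an in-house dataset to evaluate the clinical utility of our method by comparing it against skilled clinicians.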
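Voice conversion is a common speech synthesis task that can be solved in different ways depending on the particular real-world scenario. The most challenging one, often referred to as one-shot many-to-many voice conversion, consists in copying the target voice from only one reference utterance in the most general case, when neither the source nor the target speaker belongs to the training dataset. We present a scalable, high-quality solution based on diffusion probabilistic modeling and demonstrate its superior quality compared to state-of-the-art one-shot voice conversion approaches. Moreover, focusing on real-time applications, we investigate general principles that can make diffusion models faster while keeping synthesis quality at a high level. As a result, we develop a novel stochastic differential equation solver that is suitable for various diffusion model types and generative tasks, as shown through empirical studies, and we justify it by theoretical analysis.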
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
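The attack success rate (ASR) reported above is a simple ratio. The sketch below shows one common way to compute it, assuming ASR is measured on triggered test inputs whose true class is not already the target class. The abstract does not state that convention, so treat it, the function name, and the toy labels as assumptions.

```python
def attack_success_rate(predictions, true_labels, target_label):
    """Share of triggered inputs from non-target classes predicted as the target.

    `predictions` are the model's outputs on inputs that carry the trigger.
    """
    eligible = [
        pred
        for pred, true in zip(predictions, true_labels)
        if true != target_label  # inputs already in the target class are skipped
    ]
    if not eligible:
        raise ValueError("no inputs outside the target class")
    hits = sum(1 for pred in eligible if pred == target_label)
    return hits / len(eligible)


# Toy example: five triggered inputs, target class 0.
preds_on_triggered = [0, 0, 3, 0, 1]
true_labels = [1, 2, 3, 0, 4]
asr = attack_success_rate(preds_on_triggered, true_labels, target_label=0)
```

Here four inputs are eligible and two of them are pushed to the target class, giving an ASR of 0.5. A score close to 1.0, as reported for DOORPING, would mean nearly every triggered input is misclassified as the target.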
Blind image quality assessment (BIQA) remains challenging because of the diversity of distortions and the variation in image content, which complicate distortion patterns across different scales and make the BIQA regression problem harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and to better optimize the regression problem in a way that follows the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the comparison approaches and that both the MS and PMT modules improve the model's performance.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
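Selecting features by information gain, as mentioned above, can be sketched as follows. This is an illustration of the general technique for discrete features, not the MGTAB pipeline; the feature names `verified` and `has_bio` and the four toy users are hypothetical, and continuous properties would first need to be binned.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Names of the k feature columns with the highest information gain."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


labels = ["bot", "bot", "human", "human"]
columns = {
    "verified": [0, 0, 1, 1],  # separates the two classes perfectly
    "has_bio": [0, 1, 0, 1],   # carries no information about the class
}
best = top_k_features(columns, labels, k=1)
```

With balanced binary labels the maximum possible gain is 1 bit, which `verified` attains here, while `has_bio` scores 0 and is dropped. Keeping the 20 highest-scoring properties would correspond to calling `top_k_features` with `k=20`.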